A well-defined HDX-MS experiment
This vignette describeds how to analyse time-resolved differential HDX-MS
experiments. The key elements are at least two conditions i.e. apo + antibody,
apo + small molecule or protein closed + protien open, etc. The experiment can
be replicated, though if there are sufficient time points analysed (>=3) then
occasionally signficant results can be obtained. The data provided should be
centroid-centric data. This package does not yet support analysis straight
from raw spectra. Typically this will be provided as a .csv from tools such as
dynamiX or HDExaminer.
Main elements of the package
The package relies of Bioconductor infrastructure so that it integrates with
other data types and can benefit from advantages in other fields of mass-spectrometry.
There are package specific object, classes and methods but importantly there is
reuse of classes found in quantitative proteomics data, mainly the QFeatures
object which extends the SummarisedExperiment class for mass spectrometry data.
The focus of this package is on testing and visualisation of the testing results.
Data
We will begin with a structural variant experiment in which MHP and a structural
variant were mixed in different proportions. HDX-MS was performed on these samples
and we expect to see reproducible but subtle differences. We first load the data
from the package and it is .csv format.
MBPpath <- system.file("extdata", "MBP.csv", package = "hdxstats")
We can now read in the .csv file and have a quick look at the .csv.
MBP <- read.csv(MBPpath)
head(MBP) # have a look
## hx_sample pep_start pep_end pep_sequence pep_charge d confidence score
## 1 10% 19 30 VIWINGDKGYNG 2 2.120 medium 0.8686
## 2 10% 19 30 VIWINGDKGYNG 2 2.146 medium 0.8173
## 3 10% 19 30 VIWINGDKGYNG 2 2.143 medium 0.8839
## 4 10% 19 30 VIWINGDKGYNG 2 NA <NA> NA
## 5 10% 19 30 VIWINGDKGYNG 2 NA <NA> NA
## 6 10% 19 30 VIWINGDKGYNG 2 NA <NA> NA
## hx_time time_unit replicate_cnt
## 1 30 s 1
## 2 30 s 2
## 3 30 s 3
## 4 30 s 4
## 5 30 s 5
## 6 30 s 6
length(unique(MBP$pep_sequence)) # peptide sequences
## [1] 115
Let us have a quick visualisation of some the data so that we can see some of
the features
filter(MBP, pep_sequence == unique(MBP$pep_sequence[1]), pep_charge == 2) %>%
ggplot(aes(x = hx_time, y = d, group = factor(replicate_cnt),
color = factor(hx_sample,
unique(MBP$hx_sample)[c(7,5,1,2,3,4,6)]))) +
theme_classic() + geom_point(size = 2) +
scale_color_manual(values = brewer.pal(n = 7, name = "Set2")) +
labs(color = "experiment", x = "Deuterium Exposure", y = "Deuterium incoperation")
## Warning: Removed 96 rows containing missing values (geom_point).
We can see that the units of the time dimension are in seconds and that
Deuterium incoperation has been normalized into Daltons.
Parsing to an object of class QFeatures
Working from a .csv is likely to cause issues downstream. Indeed, we run
the risk of accidently changing the data or corrupting the file in some way.
Secondly, all .csvs will be formatted slightly different and so making extensible
tools for these files will be inefficient. Furthermore, working with a generic
class used in other mass-spectrometry fields can speed up analysis and adoption
of new methods. We will work the class QFeatures from the QFeatures class
as it is a powerful and scalable way to store quantitative mass-spectrometry data.
Firstly, the data is storted in long format rather than wide format. We first
switch the data to wide format.
MBP_wide <- pivot_wider(data.frame(MBP),
values_from = d,
names_from = c("hx_time", "replicate_cnt", "hx_sample"),
id_cols = c("pep_sequence", "pep_charge"))
head(MBP_wide)
## # A tibble: 6 x 198
## pep_sequence pep_charge `30_1_10%` `30_2_10%` `30_3_10%` `30_4_10%` `30_5_10%`
## <chr> <int> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 VIWINGDKGYNG 2 2.12 2.15 2.14 NA NA
## 2 VIWINGDKGYN~ 2 2.12 2.12 2.11 NA NA
## 3 GDKGYNGLAEVG 3 0.552 0.555 0.553 NA NA
## 4 LAEVGKKFEKD~ 4 2.41 2.36 2.42 NA NA
## 5 AEVGKKFEKDT~ 4 0.458 0.425 0.573 NA NA
## 6 TVEHPDKL 3 1.43 1.44 1.44 NA NA
## # ... with 191 more variables: 30_6_10% <dbl>, 30_7_10% <dbl>, 240_1_10% <dbl>,
## # 240_2_10% <dbl>, 240_3_10% <dbl>, 240_4_10% <dbl>, 240_5_10% <dbl>,
## # 240_6_10% <dbl>, 240_7_10% <dbl>, 1800_1_10% <dbl>, 1800_2_10% <dbl>,
## # 1800_3_10% <dbl>, 1800_4_10% <dbl>, 1800_5_10% <dbl>, 1800_6_10% <dbl>,
## # 1800_7_10% <dbl>, 14400_1_10% <dbl>, 14400_2_10% <dbl>, 14400_3_10% <dbl>,
## # 14400_4_10% <dbl>, 14400_5_10% <dbl>, 14400_6_10% <dbl>, 14400_7_10% <dbl>,
## # 30_1_15% <dbl>, 30_2_15% <dbl>, 30_3_15% <dbl>, 30_4_15% <dbl>, ...
We notice that there are many columns with NAs. The follow code chunk removes
these columns.
MBP_wide <- MBP_wide[, colSums(is.na(MBP_wide)) != nrow(MBP_wide)]
We also note that the colnames are not very informative. We are going to format
in a very specific way so that later functions can automatically infer the design
from the column names. We provide in the format X(time)rep(replicate)cond(condition)
colnames(MBP_wide)[-c(1,2)]
## [1] "30_1_10%" "30_2_10%" "30_3_10%" "240_1_10%"
## [5] "240_2_10%" "240_3_10%" "1800_1_10%" "1800_2_10%"
## [9] "1800_3_10%" "14400_1_10%" "14400_2_10%" "14400_3_10%"
## [13] "30_1_15%" "30_2_15%" "30_3_15%" "240_1_15%"
## [17] "240_2_15%" "240_3_15%" "1800_1_15%" "1800_2_15%"
## [21] "1800_3_15%" "14400_1_15%" "14400_2_15%" "14400_3_15%"
## [25] "30_1_20%" "30_2_20%" "30_3_20%" "240_1_20%"
## [29] "240_2_20%" "240_3_20%" "1800_1_20%" "1800_2_20%"
## [33] "1800_3_20%" "14400_1_20%" "14400_2_20%" "14400_3_20%"
## [37] "30_1_25%" "30_2_25%" "30_3_25%" "240_1_25%"
## [41] "240_2_25%" "240_3_25%" "1800_1_25%" "1800_2_25%"
## [45] "1800_3_25%" "14400_1_25%" "14400_2_25%" "14400_3_25%"
## [49] "30_1_5%" "30_2_5%" "30_3_5%" "240_1_5%"
## [53] "240_2_5%" "240_3_5%" "1800_1_5%" "1800_2_5%"
## [57] "1800_3_5%" "14400_1_5%" "14400_2_5%" "14400_3_5%"
## [61] "30_1_W169G" "30_2_W169G" "30_3_W169G" "240_1_W169G"
## [65] "240_2_W169G" "240_3_W169G" "1800_1_W169G" "1800_2_W169G"
## [69] "1800_3_W169G" "14400_1_W169G" "14400_2_W169G" "14400_3_W169G"
## [73] "30_1_WT Null" "30_2_WT Null" "30_3_WT Null" "30_4_WT Null"
## [77] "30_5_WT Null" "30_6_WT Null" "30_7_WT Null" "240_1_WT Null"
## [81] "240_2_WT Null" "240_3_WT Null" "240_4_WT Null" "240_5_WT Null"
## [85] "240_6_WT Null" "240_7_WT Null" "1800_1_WT Null" "1800_2_WT Null"
## [89] "1800_3_WT Null" "1800_4_WT Null" "1800_5_WT Null" "1800_6_WT Null"
## [93] "1800_7_WT Null" "14400_1_WT Null" "14400_2_WT Null" "14400_3_WT Null"
## [97] "14400_4_WT Null" "14400_5_WT Null" "14400_6_WT Null" "14400_7_WT Null"
new.colnames <- gsub("0_", "0rep", paste0("X", colnames(MBP_wide)[-c(1,2)]))
new.colnames <- gsub("_", "cond", new.colnames)
# remove annoying % signs
new.colnames <- gsub("%", "", new.colnames)
# remove space (NULL could get confusing later and WT is clear)
new.colnames <- gsub(" .*", "", new.colnames)
new.colnames
## [1] "X30rep1cond10" "X30rep2cond10" "X30rep3cond10"
## [4] "X240rep1cond10" "X240rep2cond10" "X240rep3cond10"
## [7] "X1800rep1cond10" "X1800rep2cond10" "X1800rep3cond10"
## [10] "X14400rep1cond10" "X14400rep2cond10" "X14400rep3cond10"
## [13] "X30rep1cond15" "X30rep2cond15" "X30rep3cond15"
## [16] "X240rep1cond15" "X240rep2cond15" "X240rep3cond15"
## [19] "X1800rep1cond15" "X1800rep2cond15" "X1800rep3cond15"
## [22] "X14400rep1cond15" "X14400rep2cond15" "X14400rep3cond15"
## [25] "X30rep1cond20" "X30rep2cond20" "X30rep3cond20"
## [28] "X240rep1cond20" "X240rep2cond20" "X240rep3cond20"
## [31] "X1800rep1cond20" "X1800rep2cond20" "X1800rep3cond20"
## [34] "X14400rep1cond20" "X14400rep2cond20" "X14400rep3cond20"
## [37] "X30rep1cond25" "X30rep2cond25" "X30rep3cond25"
## [40] "X240rep1cond25" "X240rep2cond25" "X240rep3cond25"
## [43] "X1800rep1cond25" "X1800rep2cond25" "X1800rep3cond25"
## [46] "X14400rep1cond25" "X14400rep2cond25" "X14400rep3cond25"
## [49] "X30rep1cond5" "X30rep2cond5" "X30rep3cond5"
## [52] "X240rep1cond5" "X240rep2cond5" "X240rep3cond5"
## [55] "X1800rep1cond5" "X1800rep2cond5" "X1800rep3cond5"
## [58] "X14400rep1cond5" "X14400rep2cond5" "X14400rep3cond5"
## [61] "X30rep1condW169G" "X30rep2condW169G" "X30rep3condW169G"
## [64] "X240rep1condW169G" "X240rep2condW169G" "X240rep3condW169G"
## [67] "X1800rep1condW169G" "X1800rep2condW169G" "X1800rep3condW169G"
## [70] "X14400rep1condW169G" "X14400rep2condW169G" "X14400rep3condW169G"
## [73] "X30rep1condWT" "X30rep2condWT" "X30rep3condWT"
## [76] "X30rep4condWT" "X30rep5condWT" "X30rep6condWT"
## [79] "X30rep7condWT" "X240rep1condWT" "X240rep2condWT"
## [82] "X240rep3condWT" "X240rep4condWT" "X240rep5condWT"
## [85] "X240rep6condWT" "X240rep7condWT" "X1800rep1condWT"
## [88] "X1800rep2condWT" "X1800rep3condWT" "X1800rep4condWT"
## [91] "X1800rep5condWT" "X1800rep6condWT" "X1800rep7condWT"
## [94] "X14400rep1condWT" "X14400rep2condWT" "X14400rep3condWT"
## [97] "X14400rep4condWT" "X14400rep5condWT" "X14400rep6condWT"
## [100] "X14400rep7condWT"
We will now parse the data into an object of class QFeatures, we have provided
a function to assist with this in the package. If you want to do this yourself
use the readQFeatures function from the QFeatures package.
MBPqDF <- parseDeutData(object = DataFrame(MBP_wide),
design = new.colnames,
quantcol = 3:102)
Heatmap visualisations of HDX data
To help us get used to the QFeatures we show how to generate a heatmap
of these data from this object:
pheatmap(t(assay(MBPqDF)),
cluster_rows = FALSE,
cluster_cols = FALSE,
color = brewer.pal(n = 9, name = "BuPu"),
main = "Stuctural variant deuterium incoperation heatmap",
fontsize = 14,
legend_breaks = c(0, 2, 4, 6, 8, 10, 12, max(assay(MBPqDF))),
legend_labels = c("0", "2", "4", "6", "8","10", "12", "Incorporation"))
If you prefer to have the start-to-end residue numbers in the heatmap instead
you can change the plot as follows:
regions <- unique(MBP[,c("pep_start", "pep_end")])
xannot <- paste0("[", regions[,1], ",", regions[,2], "]")
pheatmap(t(assay(MBPqDF)),
cluster_rows = FALSE,
cluster_cols = FALSE,
color = brewer.pal(n = 9, name = "BuPu"),
main = "Stuctural variant deuterium incoperation heatmap",
fontsize = 14,
legend_breaks = c(0, 2, 4, 6, 8, 10, 12, max(assay(MBPqDF))),
legend_labels = c("0", "2", "4", "6", "8","10", "12", "Incorporation"),
labels_col = xannot)
It maybe useful to normalize HDX-MS data for either interpretation or
visualization purposes. We can normalize by the number of exchangeable amides
or by using back-exchange correction values. We first use percentage
incorporation as normalisation and visualise as a heatmap.
MBPqDF_norm1 <- normalisehdx(MBPqDF,
sequence = unique(MBP$pep_sequence),
method = "pc")
pheatmap(t(assay(MBPqDF_norm1)),
cluster_rows = FALSE,
cluster_cols = FALSE,
color = brewer.pal(n = 9, name = "BuPu"),
main = "Stuctural variant deuterium incoperation heatmap normalised",
fontsize = 14,
legend_breaks = c(0, 0.2, 0.4, 0.6, 0.8, 1, 1.2),
legend_labels = c("0", "0.2", "0.4", "0.6", "0.8","1", "Incorporation"),
labels_col = xannot)
Now, we demonstrate a back-exchange correction calculation. The
back-exchange value are fictious by the code chunk below demonstrates how
to set this up.
# made-up correction factor
correction <- (exchangeableAmides(unique(MBP$pep_sequence)) + 1) * 0.9
MBPqDF_norm2 <- normalisehdx(MBPqDF,
sequence = unique(MBP$pep_sequence),
method = "bc",
correction = correction)
pheatmap(t(assay(MBPqDF_norm2)),
cluster_rows = FALSE,
cluster_cols = FALSE,
color = brewer.pal(n = 9, name = "BuPu"),
main = "Stuctural variant deuterium incoperation heatmap normalised",
fontsize = 14,
legend_breaks = c(0, 0.2, 0.4, 0.6, 0.8, 1, 1.2),
legend_labels = c("0", "0.2", "0.4", "0.6", "0.8","1", "Incorporation"),
labels_col = xannot)
